Genes 2012, 3, 35-61; doi:10.3390/genes30 10035 



OPEN ACCESS 



genes 

ISSN 2073-4425 

www.mdpi.com/journal/genes 

Article 

Comparative Genomics of Aeschynomene Symbionts: Insights 
into the Ecological Lifestyle of Nod-Independent 
Photosynthetic Bradyrhizobia 

Damien Mornico 1,2 , Lucie Miche \ Gilles Bena 1,3 ? Nico Nouwen , Andre Vermeglio 4 , 
David Vallenet 2 , Alexander A.T. Smith 2 , Eric Giraud \ Claudine Medigue 2 
and Lionel Moulin '* 

1 IRD-LSTM, UMR1 13, Campus de Baillarguet, 34398 Montpellier cedex 5, France; 
E-Mails: dmornico@genoscope.cns.fr (D.M.); lucie.miche@univ-provence.fr (L.M.); 
gilles.bena@ird.fr (G.B.); nico.nouwen@ird.fr (N.N.); eric.giraud@ird.fr (E.G.) 

2 LABGeM, CEA-Genoscope & CNRS-UMR8030, 91057 Evry, France; 
E-Mails: vallenet@genoscope.cns.fr (D.V.); asmith@genoscope.cns.fr (A.A.T.S.); 
cmedigue@genoscope.cns.fr (CM.) 

3 LMBM, Faculte des Sciences, Universite Mohammed V-Agdal, Av. Ibn Batouta BP 1014, 
Rabat, Morocco 

4 Laboratoire de Bioenergetique Cellulaire, CEA Cadarache, DSV, IBEB, 13108 Saint-Paul-lez-Durance, 
France; E-Mail: andre.vermeglio@cea.fr 

* Author to whom correspondence should be addressed; E-Mail: lionel.moulin@ird.fr; 
Tel.: +33-4-6759-3763; Fax: +33-4-6759-3802. 

Received: 16 September 2011; in revised form: 8 November 2011 / Accepted: 23 November 2011 / 
Published: 21 December 2011 

Abstract: Tropical aquatic species of the legume genus Aeschynomene are stem- and 
root-nodulated by bradyrhizobia strains that exhibit atypical features such as photosynthetic 
capacities or the use of a nod gene-dependent (ND) or a nod gene-independent (NI) 
pathway to enter into symbiosis with legumes. In this study we used a comparative 
genomics approach on nine Aeschynomene symbionts representative of their phylogenetic 
diversity. We produced draft genomes of bradyrhizobial strains representing different 
phenotypes: five NI photosynthetic strains (STM3809, ORS375, STM3847, STM4509 and 
STM4523) in addition to the previously sequenced ORS278 and BTAil genomes, one 
photosynthetic strain ORS285 hosting both ND and NI symbiotic systems, and one NI 
non-photosynthetic strain (STM3843). Comparative genomics allowed us to infer the core, 
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pan and dispensable genomes of Aeschynomene bradyrhizobia, and to detect specific genes 
and their location in Genomic Islands (GI). Specific gene sets linked to photosynthetic and 
NI/ND abilities were identified, and are currently being studied in functional analyses. 

Keywords: rhizobia; genomics; Bradyrhizobium; Aeschynomene; nod-independent 



1. Introduction 

Rhizobia are soil bacteria able to develop nitrogen-fixing symbioses with legumes, by the formation 
of organ-like structures on plant roots called nodules. Following differentiation into bacteroids, they 
transform atmospheric dinitrogen into ammonium that they provide to the plant in exchange for 
carbohydrates. Rhizobia are polyphyletic and spread among the classes Alphaproteobacteria and 
Betaproteobacteria [1]. Bradyrhizobium is a genus within alphaproteobacteria comprising many symbiotic 
species phylogenetically spread into three main clades: the B. japonicum clade, the B. elkanii clade and 
the photosynthetic Bradyrhizobium (PB) clade [2,3]. 

The photosynthetic bradyrhizobia clade is composed of symbionts specific to the tropical 
Aeschynomene legume species, on which they form root but also stem nodules. Stem nodulation is a 
plant-dependent ability quite unusual in legumes, described in only five genera (Aeschynomene, 
Discolobium, Neptunia, Sesbania and Vigna) and is often associated to the plants' aquatic life in wet or 
temporarily flooded habitats [4,5]. The infection process in Sesbania rostrata and Aeschynomene 
species during water-stress conditions occurs by "crack entry" (a process different from root hair 
curling) by intercellular infection of epidermal fissures (cracks) generated by the emergence of lateral 
roots [6,7]. The nodulation process between Sesbania rostrata and its specific symbiont Azorhizobium 
caulinodans occurs via a classic Nod-factor dependent process [7,8]. The symbiotic interaction 
between Aeschynomene and photosynthetic Bradyrhizobium, however, presents several particularities. 
First, by definition, PB possess a photosynthetic activity [9] that might play a role in symbiosis, either 
by conferring a selective advantage for bacterial survival ex planta (as a supplementary source of 
energy), and/or by facilitating bacterial infectivity and symbiotic effectiveness [10,11]. Second, some 
photosynthetic bradyrhizobia lack the canonical nodulation genes (genome -based analyses on ORS278 
and BTAil) and use an unknown nod gene-independent pathway to form nodules on some 
Aeschynomene species [12,13]. Aeschynomene species are categorized into three cross-inoculation (CI) 
groups [14,15], linked to specific features of their symbionts mentioned above: their photosynthetic 
ability, their capacity to form stem nodules, and finally their use of a nod gene-dependent (ND) or nod 
gene-independent (NI) symbiotic pathway. The CI group I includes Aeschynomene species nodulated 
only on their roots by non-photosynthetic bradyrhizobia, stem nodulation being restricted to CI group 2 
and 3. CI group 3 is characterized by plant species nodulated only by PB lacking nodulation (nod) 
genes (NI, as strains ORS278 and BTAil symbiotic of A. indica and A. sensitiva), while CI group 2 
species are nodulated by both photosynthetic and non-photosynthetic strains (as A. afraspera) [14]. It 
is relevant to note that photosynthetic strains able to nodulate CI group 2 (as ORS285) nodulate also CI 
group 3, as they carry both the ND and NI systems. Indeed an isogenic nodA minus mutant of ORS285 
lost its symbiotic ability on CI group 2 but was still able to nodulate efficiently CI group 3 [13]. In a 
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recent study, Miche and colleagues [16] sampled many rhizobial strains on CI group 3 Aeschynomene 
species in Central and South America. All isolates belonged to the PB clade, except a new group of 
non-photosynthetic strains phylogenetically placed at an intermediate position between the photosynthetic 
clade and the B. japonicumlB. elkanii clades, and nodulating the Aeschynomene CI group 3 (stem and 
roots) by the Nl pathway (see Figure 1, STM3843 clade). 

Figure 1. Maximum likelihood recA phylogeny of Aeschynomene bradyrhizobial 
symbionts indicating the position of genome-sequenced strains. Adapted from [16]. 
Numbers at nodes are bootstrap percentages from 100 replicates. Roman numbers indicate 
clades defined from AFLP analyses and recA phylogeny in [16]. Abbreviations: 
PB: photosynthetic bradyrhizobia; Sep: separate group with no affiliated number; 
ND/NI: nod gene dependent/independent symbiotic pathway. 
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Research is currently under way to identify the genetic bases of the Nl symbiotic pathway. 
Traditional genetic approaches using a random Tn5 transposon insertion allowed the identification of 
mutants affected in nodule development and nitrogen fixation [12]. However, no complete nodulation 
deficient mutants were identified during this screening, underlining a potential functional redundancy 
in genes involved in the early steps of this interaction, and the need for different approaches to 
identify them. 
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To better understand how photosynthetic bradyrhizobia have adapted to their particular niches 
(aquatic, nodules on Aeschynomene), we developed in this study a comparative genomics approach on 
several strains with a focus on the nod-dependent/nod-independent and photosynthetic/non-photosynthetic 
bacterial lifestyles. To reach this aim, we sequenced and compared the genomes of (i) five nod gene 
independent (NI) PB strains of CI group 3 from different sub-branches of the PB clade (ORS375, 
STM3809, STM3847, STM4509 and STM4523) and included the previously sequenced ORS278 and 
BTAil genomes; (ii) a PB strain hosting both ND and NI genetic bases (ORS285, CI group 2); 
(iii) a NI non-photosynthetic strain of CI group 3 (STM3843). Comparative genomics analyses 
allowed us to (i) infer the genomic diversity and evolutionary dynamics of PB genomes (core and 
accessory gene sets, Genomic Islands); (ii) identify gene sets potentially involved in symbiosis and 
photosynthesis. The latter are currently being studied by functional analyses. 

2. Results and Discussion 

2.1. Genomic Features of Bradyrhizobial Genomes 

This study compares 10 Bradyrhizobium genomes, including nine genomes of Aeschynomene 
symbionts, and the genome of B. japonicum USDA110. Three complete genomes were already 
published {Bradyrhizobium sp. BTAil, Bradyrhizobium sp. ORS278 [13], B. japonicum USDA110 [17]). 
Seven strains were sequenced for this study, selected as representatives of almost every phylogenetic 
branch of the photosynthetic bradyrhizobia clade (isolated from worldwide Aeschynomene species) as 
established in the recA gene phylogeny published in [16] (see Figure 1 for phylogenetic placement). 
Genetic features and sequencing methodologies used for each strain are given in Table 1 . 



Table 1. Bacterial strains and genome sequencing statistics. 



Strain/ 
Replicon 


Group * 


CI 

Group 


Geographical 
Origin 


ND/NI 


Sequencing 
Status 


Nb 

Contigs 


Size (bp) 


Accession 
Number 


Photosynthetic Bradyrhizobium clade 












ORS278 


VIII 


3 


Senegal 


NI 


Complete 


1 


7456587 


NC_009445 


BTAil Chr 


Sep 


3 


USA 


NI 


Complete 


1 


8264689 


NC 009485 


BTAil PI 




Complete 


1 


228826 


NC_009475 


ORS375 


III 


3 


Senegal 


NI 


454 + Solexa 


497 


7909110 


CAFI0 10000 
01-497* 


STM3809 


X 


3 


F. Guiana 


NI 


454 + Solexa 


803 


7391986 


CAFJ0 10000 
01-803 & 


ORS285 


VI 


2 


Senegal 


Both % 


454 (30X) 


301 


7632258 


CAFH0 10000 
01-301 & 


STM3847 


VII 


3 


F. Guiana 


NI 


Solexa 20X 


152475 


10121035 § 


ERP000868 $ 


STM4509 


II 


3 


Mexico 


NI 


Solexa 20X 


190139 


9206235 § 


ERP000868 $ 


STM4523 


Sep 


3 


Mexico 


NI 


Solexa 20X 


158371 


8610503 § 


ERP000868 $ 


Non-photosynthetic strain 












CAFK0 10000 
01-350* 


STM3843 


XI 


3 


F. Guiana 


NI 


454 + Solexa 


350 


8469730 
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Table 1. Cont. 



Strain/ 

Group * 

replicon 


CI 

Group 


Geographical ^ 
Origin 


Sequencing 
Status 


Nb 

Contigs 


Size (bp) 


Accession 
Number 


B. japonicum 














USDA110 Sep 


1 


USA ND 


Complete 


1 


9105828 


NC 004463 



Strain genomes sequenced in this study are in bold. * Species or AFLP group was defined 
according to [16]; Sep: separate group with no affiliated number; ND/NI: nod-dependent or 
nod-independent strain according to [13,16]; CI: Aeschynomene cross inoculation group according 
to [14,15]; % : ORS285 hosts both systems; § : genome size estimates from Solexa read contig 
assemblies; & EBI accession numbers; $ ENA (trace archive) accession number for solexa reads [18]. 



Presence of plasmids was checked for by PFGE analysis (not shown), as well as the search for 
replication genes (rep ABC) within each genome. All 10 genomes seem to be composed of a unique 
chromosome, except strain BTAil which also harbors a plasmid [19]. B. japonicum USDA110 
possesses the biggest genome with a length of 9,105,828 bp, and Bradyrhizobium. sp. STM3809 the 
smallest with 7,391,986 bp (Table 1; when excluding Solexa draft genomes for which size inference is 
uncertain). Bradyrhizobial genomes are GC rich bacteria with an average content of 65% (G + C). The 
protein coding density has an average of 86% and the number of CDS fluctuates between 6,750 (ORS278) 
and 9,650 (B. japonicum). Strains host between 3 and 6 rRNA genes in 1 to 3 operons, 
and 47 to 52 tRNA genes were identified in the various strains (Table 2). Codon and amino-acid usage 
is identical within bradyrhizobia (Supplementary Figure SI), which can be explained by the same GC 
genome content (64%-66%) and their phylogenetic proximity. 



Table 2. Genomic features of bradyrhizobial genomes. 



Strain 


GC% 


CDSN 


CDSL 


IGR (bp) 


PCD (%) 


rRNA 


tRNA 


MscRNA 


NRR (%) 


ORS278 


65.51 


6748 


952.1 


180.23 


85.50 


6 


50 


10 


8.76 


BTAil chr 


64.92 


7466 


959.42 


176.72 


85.59 


6 


52 


12 


9.33 


BTAil pi 


60.71 


257 


800.2 


186.02 


79.42 








4.75 


ORS285 


65.23 


6848 


955.92 


184.32 


85.41 


4 


49 


11 


10.45 


ORS375 


65.49 


7348 


921.28 


182.92 


84.99 


3 


52 


11 


10.22 


STM3809 


66.18 


7142 


879.79 


182.97 


84.19 


3 


47 


13 


9.42 


STM3843 


63.3 


8399 


878.57 


159.54 


86.30 


3 


48 


10 


5.10 


BjUSDAllO 


64.06 


9648 


862.74 


135.17 


88.98 


3 


50 


3 


9.53 



Abbreviations used: CDS: Coding Sequences; CDS-N: CDS number; CDS L: mean CDS Length; 
PCD: Protein coding density; Msc: Miscellanous; NRR: Nosferatu Repeated Region; bp: base pair. 



2.2. Core, Dispensable and Strain-Specific Genes 

A comparative analysis of gene content was undertaken to classify genes into groups depending on 
their degree of orthology/conservation among strains, following a bidirectional best hits approach 
(see Experimental Section). This analysis was performed on seven genomes (fragmented Solexa 
genomes were not included). First, genes present in strictly all seven genomes were pooled in a group 
called "core-genome". Then, genes identified in only one genome were pooled in groups "specific" to 
each strain. All other genes, present in a least two genomes (but not in all), were pooled in a group 
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called "dispensable genome". Finally, the pangenome includes all genes (core, dispensable and 
specific). This analysis is illustrated by a Venn diagram presented in Figure 2A. 

Figure 2. (A) Comparative gene orthology between seven genomes of Bradyrhizobium. 
Numbers indicates the shared gene sets between groups of strains, satisfying the following 
criteria: BBH (bidirectional best hits) with 40% amino-acid identity on 80% of the smallest 
protein; (B) Core and pangenome gene count in the Bradyrhizobiaceae family. CDS 
indicates the number of protein-coding genes in each genome listed in x-axis. The y-axis 
indicates the number of genes in core and pangenome when adding genomes in the x-axis 
to the comparative analysis. The presented tree was built by Maximum likelihood 
(GTR + I + G model) on a partition of 5 taxonomic marker genes (atpD, dnaK, recA, rpoB, 
rpoD), with bootstrapped nodes (100 replicates). Japonicum/Bj: B. japonicum USDA110; 
Rpal: Rhodopseudomonas palustris; Nwi: Nitrobacter winogradskii; Nham: Nitrobacter 
hamburgensis; Azo: Azorhizobium caulinodans. 

A B 

25000 I 
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The core genome of bradyrhizobia (i.e., genes shared by all seven strains) is composed of 3,663 genes, 
of which 67% have an annotated function, 32% encode conserved proteins of unknown function, and 
only 1% are hypothetical proteins (see Supplementary Figure S2A). This core gene set included 
all 206 genes from the minimal bacterial gene-set described in [20]: genes encoding DNA replication 
or repair machinery (DNA polymerase subunits, gyrase or helicase), transcription and translation 
enzymes and factors (RNA polymerase, amino-acids-tRNA synthases, ribosomal proteins, etc.), 
transport (PTS system), energy and intermediary metabolism (ATP synthase, acyl-CoA synthase, 
glycolysis enzymes). The dispensable genome was the largest group with 9,395 genes, reflecting a 
high rate of horizontal gene transfer and the metabolic versatility among bradyrhizobia. It was 
composed of 8% of hypothetical proteins, 35% of conserved proteins of unknown function, and 65% of 
proteins with functional annotation (statistics almost identical to the core genome annotation). The 
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specific gene set of each genome ranges from 489 to 3,268 genes, and was directly related to the 
phylogenetic distance between one strain and its closest neighbors. For example, a strong correlation 
was found between the phylogenetic distance of Bradyrhizobium strains towards ORS278 (based on a 
distance matrix of the 6 genes used in MLSA for Figure 2B) and their number of specific genes 
compared to ORS278 (R = 0.87). B. japonicum USDA110 is the most distant from other strains 
(harboring 3,268 specific genes) followed by STM3843 (1,892 specific genes), that is at an intermediate 
phylogenetic distance between USDA1 10 and the photosynthetic strains clade (Figure 2B). 

Photosynthetic strains harbors between 489 and 1,002 specific genes. BTAil genome exhibits the 
largest number of specific genes, of which 125 are located on its plasmid (pBRAB). pBRAB 
contains 296 CDS, meaning that 171 (57%) of these plasmidic genes are chromosomal in other strains, 
but they were not arranged in groups with obvious synteny to the chromosomes. Among the strain-specific 
gene sets, most genes were annotated as hypothetical protein (from 37% to 66% depending on strain) 
or conserved protein of unknown function (4% to 27%; with orthologs founds in RefSeq or Uniprot 
databases, see Supplementary Figure S2A-C), but several functions of ecological interest could be 
identified. For example, the ORS278 specific gene set encodes gas vesicle proteins (used by 
cyanobacteria to adjust their position in water depending on light conditions [21]) and RTX toxins 
(gram-negative exotoxin); the BTAil one encodes transporters involved in heavy metal resistance, a 
type IV secretion system, uptake hydrogenases (Hup/Hyp proteins, recycling of hydrogen produced 
during the nitrogen fixation process in legume nodules, [19]), a siderophore, an additional ATP 
synthase; the ORS285 specific gene set encodes several non-ribosomal peptide synthetase, a 
glycopeptide antibiotic (Tyrocidine synthetase 3); ORS375 contains specific genes encoding enzymes 
involved in anaerobic benzoate degradation pathway (AliB), gas vesicle proteins (GvpF/L), arsenic 
resistance; a bacteriophytochrome (BphB); STM3809 harbors specific genes encoding enzymes 
involved in catechol (CatABCD) and aromatic acid degradation (XylY, XylZ, BenA, BenD). In the 
case of STM3843 (1,892 specific genes), it was possible to identify functions involved in vitamin 
biosynthesis (Bl, B2, B12), fumarate catabolism, urease metabolism, many adenylate/guanylate 
cyclases (regulation of AMPc/GMPc secondary messengers), rhizopine catabolism (moc genes), 
fermentation of carbohydrates (pyruvate pathway-porAB genes), biosynthesis of LPS and EPS 
(exo genes, O-antigen synthesis), and several toxin/antitoxin (TA, as YoeB-YefM) modules involved 
in bacterial stress response and/or stabilization of chromosomal integrons [22]. 

Each group of genes was functionally annotated according to the KEGG classification and 
distribution of functional categories of genes were compared between strain-specific, dispensable and 
bradyrhizobial core genes (shown in Supplementary Figure S2B,C). In general, specific and 
dispensable gene sets exhibited the same KEGG distribution, both differing highly in proportions from 
the core genome for several functions such as "xenobiotics degradation and metabolism" and 
"membrane transport" (mainly ABC transporters and secretions systems; see Supplementary Figure S2B) 
that are over-represented in the specific and dispensable gene sets, or cell routine functions 
(translation, protein folding, nucleotide metabolism) that are almost specific to the core gene set. 

A distribution of the core and pangenome at different taxonomic levels (PB clade, Bradyrhizobium 
genus, Bradyrhizobiaceae family) is shown in Figure 2B. The core genome of the PB clade 
(including 5 photosynthetic bradyrhizobia) is composed of 4,792 genes, which decreases to 4,208 for 
Aeschynomene symbionts (PB clade + STM3843; excluding Bj USDA1 10), then to 3,663 {Bradyrhizobium 
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genus), 1,632 (Bradyrhizobiaceae family, including Rhodopseudomonas and Nitrobacter genomes), 
and finally drops to 1,212 after inclusion of an out-family alphaproteobacteria (Azorhizobiuni). The 
pangenome gene number increases dramatically at each addition of a new bacterial genome, 
reaching 12,040 genes at 5 PB strains, with an increased rate when shifting taxonomical borders 
(13,898 with STM3843, 16,488 with USDA110, ending at 20,162 for the Bradyrhizobiaceae family). 
It does not seem to reach a plateau even after addition of 5 PB genomes, reflecting the large genomic 
diversity within this clade. By comparison, the core and pangenome of 5 Rhodopseudomonas strains 
(Bradyrhizobiaceae family, closely related to bradyrhizobia and belonging to at least 4 several species) 
contain 3,319 and 8,000 genes, respectively [23], around 30% less than the Brady rhizobium genus. 

2.3. Whole Genome Comparative Analyses and Distribution of Genomic Islands 

Whole genome alignments were performed on the basis of a reference genome (see Experimental 
Section). Figure 3 presents whole genome alignments based on ORS278 (3A), ORS285 (3B), 
STM3843 (3C) and USDA110 (3D) with 9 genomes of Brady rhizobium (other genomes and larger 
genome views are available as Supplementary Figure S4). This representation allows inclusion of 
complete, draft and even highly fragmented genomes (obtained by Solexa technology). 

The first result obtained from Figure 3 is the high genome homology between strains from the PB 
clade (darker blue layers on Figure 3A,B), while less apparent when STM3843 (Figure 3C) or 
USDA110 (Figure 3D) were used as reference genomes (lighter blue circles around the reference 
genome). Interestingly, when B. japonicum USDA110 was used as the reference genome (Figure 3D), 
the 680 kb symbiotic island (localized between 1.7 and 2.3 Mb in USDA1 10 genome, integrated into a 
val-tRNA gene) did not show any match with the Aeschynomene symbionts genomes, except for nif 
and fix genes for all strains (encoding the symbiotic nitrogen fixation system), the nodABCIJ and a 
type 3 secretion system for ORS285 (that use both ND and NI systems). However, circular views also 
allow the detection of various "small" GI, exemplified by holes in the blue layers around the reference 
genome. As shown in Figure 3, many GI were found among genomes, some being shared between 
strains. This analysis was further explored using RGP finder (see Experimental Section), and all 
detected GI were numbered and their distribution among bradyrhizobial genomes was investigated 
(except on the 3 solexa-based genomes). Supplementary Table SI presents the listing and distribution 
of 613 detected GI (see Mat&Methods section for details), with associated functions and distribution in 
bradyrhizobial genomes. In PB genomes (5 genome-based), 278 GI were found (53 to 78 GI per 
genome), of which 33 (11%) host phage-related functions, 14 (6%) carry conjugal transfer genes, 
and 45 (16%) carry transposases or fragments of transposases. Overall, these GI carrying mobile 
elements represents 27% of PB GI -hosted genes. 

A dendrogram based on the presence/absence of GI was built, and compared to a phylogeny based 
on the core genome set (Figure 4). Interestingly, both trees were slightly discordant with a different 
position of STM3809 within the PB clade that seemed correlated in the GI -based dendrogram with the 
geographical origin of strains (STM3809 and BTAil are of American origin, while ORS strains are of 
West African origin). In both trees, STM3843 and B. japonicum USDA110 diverged from the PB 
clade, reflecting their high taxonomic distance to the PB clade. 
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Figure 3. Circular view of whole genome alignments of bradyrhizobial genomes. Genomes 
were aligned to a reference genome [for (A) ORS278, (B) ORS285, (C) STM3843, 
(D) USDA1 10)]. The localization of aligned genomes is given as a list from inner to outter 
circles in the center of each figure. Legend (from inner to outter layer of circles: genome 
scale (kb), GC deviation (gray), GC skew (orange), aligned genomes (aligned parts are in 
blue, each layer is a different genome, strains are listed close to the reference genome name 
in the center of each figure, from inner to outer layers), GI (purple), core genome (red), 
specific genes to reference genomes (pink). Higher resolution images for each genome 
analysis are available in Supplementary Figure S4. 
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Figure 4. Comparison of core-genome based phylogeny and a dendrogram based on GI 
distribution across bradyrhizobial genomes. 
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2.4. Metabolic Network Comparison of Strains 

A principal component analysis was performed to describe the completion (measured as a 
percentage) of all known metabolic pathways present in the seven Bradyrhizobium genomes in order to 
find metabolic specificities or common pathways among several strains. This analysis was associated 
with the symbiotic and photosynthetic phenotypes of strains. The three first factorial axis (Figure 5, 
Supplementary Figure S5) captured over 80% of the total data variability. The first factor (Factor 1) 
isolates metabolic specificities of B. japonicum USDA110 against other strains, Factor 2 captures 
variability of STM3843 compared with the others, and the third axis (Factor 3) separates BTAil from 
the other genomes. 

Strain segregation in the PCA plots can be explained by the metabolic differences which are larger 
in USDA110, STM3843 and BTAil compared to other strains. On the first factorial plane (Figure 5), 
64 pathways more complete in B. japonicum are represented by red vectors pointing in its direction, 
indicative of many metabolic specificities of this strain compared to others (biosynthesis of various 
molecules from carbohydrates to amino-acids, degradation of many xenobiotics, see Supplementary 
Table S2). Pathways more complete in STM3843 (separated from others on second factorial plane, 
vector group 6, 20 pathways) included metabolic enzymes involved in biosynthesis and degradation of 
various substrates. Specific metabolic pathways for photosynthetic strains (vector 4) included 13 pathways 
with known roles in photosynthesis (bacteriochlorophyll a biosynthesis, phytyl diphosphate 
biosynthesis). Finally, 60 pathways more complete in nod-independent strains (First factorial plan, 
vector 2 and 7) separated them from nod-dependent ones. These pathways included many degradation 
enzymes (of molecules such as atrazine, nitrobenzene, purine, pyrimidine) and biosynthesis enzymes 
(for products as phycocyanobilin). 
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Figure 5. Principal Component Analysis (PCA) of seven Bradyrhizobium genomes, 
performed on a two dimensional matrix compiling metabolic pathway completion values 
(the number of enzymatic reactions with coding genes for pathway x in a given organism, 
divided by the total number of reactions in pathway x as defined in the MetaCyc database; 
see Experimental Section) across all genomes. In the plot, each genome is represented by a 
point, whereas pathways are shown as colored vectors. Pathways with correlated 
completion values across organisms (vectors with similar orientation) have been clustered 
(corresponding to numbers 1 to 7) and drawn in the same color (vectors of pathways with 
identical completions overlap). Genomes can be associated with their representative and 
characteristic groups of metabolic pathways (i.e., vectors "pointing in their direction"). The 
corresponding pathway functions are listed in Supplementary Table S2. Vector length 
encodes the quality of representation of the pathway in the presented plot (i.e., 
the longer a vector is, the more representative its pathway is for genomes on the same side 
of the plot). Abbreviations: photo-/photo+: non-photosynthetic/photosynthetic strains; 
Nod+/Nod-: nod-dependent/nod-independent strains). 
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2.5. Functional Exploration of Genomes: Focus on Photosynthetic and Symbiotic Abilities 

Orthologs within dispensable gene groups were searched for among strains sharing the 
photosynthetic/non-photosynthetic and nod-dependent/nod-independent phenotypes (strain using 
nod genes to develop their symbiosis with the plant host or not). The phylogenetic profile 
("phyloprofile" for short) exploration of genomes is presented in Figures 6A (photosynthesis) and 6B 
(symbiosis). A KEGG functional classification of specific genes for each phenotype is presented in 
Supplementary Figure S3. 
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Figure 6. Phyloprofile exploration of genomes according to photosynthetic (A) and 
nodulation (B) abilities. 




2.5.1. Photosynthetic Versus Non-Photosynthetic Strains 

The specific gene set of PB is composed of 269 genes, of which 50% were annotated with functions 
known to be involved in bacterial photosynthesis (see Supplementary Table S3 for the complete listing 
of genes), 64 (23%) were annotated as hypothetical proteins, and 74 (27%) as conserved proteins of 
unknown functions. Supplementary Table S4 presents a list of photosynthetic strain-specific gene set 
with annotated functions, sort by functional roles. 

Bacterial photosynthesis has been shown to play an important role during stem symbiosis with 
Aeschynomene plants. It has been proposed that during the early steps of symbiosis, the energy 
provided by the photosynthetic apparatus facilitates ex planta survival and infectivity whereas during 
the later steps this energy can be used by the bacteria to fix nitrogen, thus limiting the demand for plant 
photo synthetates [10]. All the genes necessary for the formation of the photosystem (PS) are clustered 
in a 50 kb-region, the so-called "photosynthesis gene cluster" or PGC that includes genes found in 
the 269 specific gene set of PB: (i) bch genes involved in bacteriochlorophyll biosynthesis, (ii) crt 
genes involved in carotenoid biosynthesis (spirilloxanthin), (iii) puc and puf genes which encode 
respectively the core proteins of the reaction center (RC) and for the light-harvesting complexes I 
(LHI), and (iv) several regulators (bacteriophytochrome and ppsR genes). The organization of the 
genes in the PGC is perfectly conserved in all the photosynthetic Bradyrhizobium strains examined 
during this study (ORS278, BTAil, ORS285, ORS375 and STM3809) whereas none of these genes 
can be found in the non-photosynthetic Bradyrhizobium strains (USDA110 and STM3843). It is 
hypothesized that the ancestor of the Bradyrhizobium japonicum and B. elkanii clades was a 
photosynthetic free-living bacteria that later adapted to soil conditions and acquired nodulation genes, 
leading subsequently to a loss of the photosynthetic character due to a light-devoided habitat [11]. The 
absence of any hallmark of acquisition of the PGC by lateral gene transfer (absence of insertion 
sequences, GC% nearly identical to the whole genome, no GC skew deviation), a phylogeny of pufL 
being nearly identical to the one of the core genome (not shown), and the fact that Bradyrhizobium 
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species are closely related with the photosynthetic Rhodopseudomonas palustris species sustain 
this hypothesis. 

The synthesis of the photosynthetic apparatus is under the control of an original regulatory 
mechanism that has been described in ORS278 and BTAil strains [24,25]. Its formation occurs only 
under far-red light illumination and semi-aerobic conditions and involves a control by a light receptor 
(the bacteriophytochrome iMBphPl) and two transcriptional factors of the PpsR family (PpsRl which 
is redox sensitive acts as an activator whereas PpsR2 acts as a repressor and its activity is modulated 
by 5rBphPl). These three regulators are encountered in the three other photosynthetic Bradyrhizobium 
strains, suggesting that this unusual mechanism of regulation of PS synthesis by light and oxygen 
tension is a typical feature of this bacterial group. This regulatory circuit appears to be perfectly 
adapted to promote PS synthesis during stem symbiosis [24]. 

Unlike ORS278 strain, the PS apparatus of BTAil strain contains additional peripheral light-harvesting 
complex (LH2) encoded by a pucBA operon. The expression of this operon is also under the control of 
a bacteriophytochrome (BBta_3080) found at the vicinity of the pucBA operon [26]. Interestingly, this 
gene organization is found in two of the other three photosynthetic strains (ORS375 and STM3809) 
that were studied, suggesting that the presence of LH2 complexes is rather frequent in PB strains. 

Several bacteriophytochromes genes have been detected in photosynthetic Bradyrhizobium strains 
(2 to 4). In particular, one bacteriophytochrome (BRADO2008) is encountered in all PB as well as in 
the non-photosynthetic strain STM3843. Up to now, no function for this bacteriophytochrome has been 
identified and no genes in relation with light could be found at its vicinity. We suspect that this 
function is not in relation with photosynthesis as it is also found in the non-photosynthetic strain 
STM3843 and no alteration in the photosynthetic activity of an ORS278 mutant was detected (data not 
shown). Some PB strains possess specific bacteriophytochromes. The ORS278 strain contains the gene 
BRAD01262, located on a Horizontally Acquired Island (HAI) together with genes involved in 
phycocyanobilin (PCB) and gas vesicle syntheses [27]. This bacteriophytochrome binds PCB as a 
chromophore instead of biliverdin and displays very atypical spectral properties [27]. The ORS375 
strain also possesses a specific bacteriophytochrome (BRAO375vl_4770021) hosted on a HAI, though 
no functional data are yet available for this bacteriophytochrome. 

The relatively high number of bacteriophytochromes found in PB strains (2 to 4, while 1 in 
STM3843, and none in B. japonicum USDA1 10) indicate that these bacteria are well equipped to cope 
with a luminous environment, which is probably related to their ability to occupy an ecological niche 
exposed to sunlight (aquatic environment and phyllosphere). This niche is different from those of 
"classic" rhizobia, which inhabit soil and usually lack any bacteriophytochromes. 

Photosynthesis can be defined as the biological process by which organisms make carbohydrates 
from CO2 using energy captured from sunlight. The functional relationship between the photosystem 
and the Calvin Benson Bassham cycle (CBB) is also found at the genetic level. Indeed in all PB strains 
studied, a ebb gene cluster is found contiguous to the PGC. Interestingly, whereas the PGC has been 
lost (or is absent) in the non-photosynthetic strains (USDA110 and STM3843), a ebb gene cluster is 
found in perfect synteny in photosynthetic strains. These ebb genes are surely functional as it has been 
clearly demonstrated that B. japonicum is able to fix C0 2 in presence of Hydrogen [28]. The 
maintenance of a functional CBB cycle might be the result of a selective advantage but its role remains 
unknown. Recently, it has been described in the ORS278 strain that Rubisco, the key enzyme of the 
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Calvin cycle, is required for an efficient symbiosis with A. indica [29]. It has been proposed that the 
primary role of the CBB cycle is not to produce biomass but to act as an electron sink driving the cell's 
excess electrons to CO2 and thus oxidizing reduced factors. This role might be critical in the early 
steps of symbiosis when oxygen tension in the nodule becomes limiting and the nitrogenase activity is 
not yet established. In such conditions, the functioning of the CBB cycle might help deal with the pool 
of reduced cofactors produced by the Krebs cycle. 

2.5.2. Focus On Nod-Dependent (ND) and Nod-Independent (NI) Symbiotic Strains 

Since our genome set included strains exhibiting ND and NI symbiotic phenotypes, we looked at 
specific gene sets present only in nod-independent strains, by subtracting the B. japonicum USDA110 
gene set from the core genome of all Aeschynomene symbionts (ORS278, ORS285, BTAil, ORS375 
and STM3809). We hypothesized that all NI strains possess conserved genetic bases to establish their 
symbiotic interaction. This analysis revealed 446 genes specific to NI strains (Figure 6B). This 
relatively high number of genes is probably related to the close phylogenetic distance between the PB 
strains (thus sharing many metabolic pathways due to their common ecological niche) and their 
monophyly. This gene set was also explored in solexa-based draft genomes (STM3847, STM4509 and 
STM4523) using tblastn searches, and 62 genes were found absent among the 3 genomes, though their 
absence could be the consequence of the fact that these genomes aren't completely assembled. These 
absent genes encoded mainly hypothetical or conserved hypothetical proteins. 

The functional annotation of the 446 genes (see Supplementary Figure S3 for KEGG classification 
and Supplementary Table S5 for a complete listing) showed 104 hypothetical proteins (23%), 
125 conserved proteins of unknown function (28%), and 217 (49%) genes with annotated functions of 
which 23 had a gene name. Several functions putatively involved in symbiosis were found and are 
listed in Table 3: 

(i) Regulatory genes of which many are two-component regulatory proteins (9 genes with 
sensor/histidine kinase domains), diguanylate cyclase (BRADO0188, BRADO1402, BRADO6510), 
circadian clock proteins KaiB and KaiC involved in circadian rythms (BRAD01478 and 1479); 

(ii) Catabolic enzymes: catabolic pathways for protocatechuate (BRAD02235 to 2343, 
BRAD02378-2379) and vanillate degradation (BRAD01851), two compounds derived from lignin 
monomers [30] (lignin being an abundant constituent of plant cell walls); NUDIX hydrolases that 
hydrolyse a wide range of organic pyrophosphates and reflect the metabolic complexity and adaptability 
of a given organism [31]; various catabolic enzymes especially in Taurine degradation; 

(iii) Af-acetyl-glucosamine modifying enzymes as UDP-A L Acetylglucosamine 1-carboxyvinyltransferase 
(BRAD06639), with some proteins containing TPR repeats (BRAD04235, BRAD07123); 

(iv) Polysaccharide & glycan biosynthesis with 2 specific NI gene regions: BRAD04794-4814 with 
enzymes involved in wall techoic acid biosynthesis, an anionic polymer that decorate the 
peptidoglycan layers of many Gram-positive organisms; and BRAD05 154-5204 located on a GI in all 
PB strains; 

(v) Biosynthesis of various molecules including carotenoids (BRADO0659), Indole-3 -acetic acid 
(IAA) (BRADO7016), antibiotics (BRAD01346), cobalamin (B12 vitamin, BRAD04898-4917), and 
a type 3 polyketide synthase (BRADO0151); 
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(vi) Nitrogen fixation as nijV (BRADO5390), encoding an homocitrate synthase responsible for 
ex planta N2 fixation ability [32], as well as several nitrogen- fixation related genes (BRAD05414, 
5416, 5418, 5424) clustered in an operon conserved among all NI strains as well as in Azorhizobium 
caulinodans ORS571, but not in rhizobia as B. japonicum USDA110, S. melioti 1021 or M. loti 
MAFF303099 (that do not fix nitrogen in the free-living state). Interestingly, this gene cluster is 
up-regulated in A. caulinodans ORS571 bacteroids on Sesbania rostrata stem nodules, indicating a 
probable specific role in symbiotic nitrogen fixation in these symbionts; 

(vii) Carbon monoxide oxidation (BRADO6025-6032), that catalyzes the reversible conversion 
between carbon dioxide (CO2) and carbon monoxide (CO), and found in many marine bacterial genomes; 

(viii) Chemotaxis: several methyl-accepting chemotaxis proteins (receptor/sensory transducer); 

(ix) LuxI-LuxR quorum sensing with BRADO0941 involved in the biosynthesis of a cinnamoyl-HSL 
(Homo Serine Lactone), an unusual aryl-HSL [33]; 

(x) Transport: General secretion pathway protein (BRAD06338-type II, BRAD06341, 6344), 
HlyD-family protein (BRAD07122). 

Some NI candidate genes were mutated by directed Tn5 insertions (BRADO7016-IAA 
biosynthesis, BRADO0151-type 3 polyketide synthase, and BRADO3025-l,4-alpha-glucan branching 
enzyme) but none of the obtained mutants were found to be altered in their symbiotic performance 
suggesting that these determinants are not critical for the Nod- independent process [34]. 

Localization in GI was also investigated for the NI specific gene set. We had anticipated that the NI 
symbiotic pathway genes could be carried by a genomic island conserved among the NI strains. No 
signature of a conserved large symbiotic island (as in the USDA110 SI that harbored 60% of 
transposases of the whole genome [17]) could be identified among the NI strains. Unfortunately, the 
RGP search engine looked at various signatures of DNA (GC skew, tRNA) but focused on specific 
regions within one genome compared to others, thus excluding a genomic island if shared by all (or 
almost all) genomes. We thus mapped our gene set to the HAIs detected by DNA signatures (GC skew, 
tRNAs, Insertion Sequences) in our previous study on ORS278 [13]. Twenty genes were found 
harbored by HAIs, and are labeled in bold in Table 3. Most genes belonged to two HAIs in ORS278 
(BRADO5154-BRADO5204, and BRADO4807-4814), and were also located in HAI in all NI strains. 
These genes are involved in the biosynthesis of polysaccharides (EPS, LPS), a class of molecules 
known to play important roles in legume invasion during symbiosis (attachment to roots, biofilm 
formation, or interaction with plant defenses) [35]. 

To obtain further insight into the bacterial genes involved during the early steps of the Nod-independent 
symbiosis, a blind approach consisting in the screening of a large Tn5 mutant library (25,000 mutants) 
of the Bradyrhizobium ORS278 strain for their inability to induce nodules on Aeschynomene plants has 
been developed [12,13]. No strict nodulation-deficient mutants were found, but several mutants 
severely affected in nodule organogenesis were obtained. Among them, a mutant affected in a putative 
transcriptional factor of the TetR family (BRAD03272) was the only one to belong to the 446 NI 
specific gene pool. Interestingly, two upstream genes (BRAD03274 and BRAD03275) encode 
proteins of unknown functions and are also NI specific. Further studies are in progress in order to 
understand the role of this gene cluster during the NI symbiotic process. Apart from the BRAD03272 
gene, no gene overlap could be found between mutants altered in symbiotic abilities and the 
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comparative genomics approach. This result can be partially explained by the high redundancy of 
the 446 nod-independent specific gene set, of which 48% possess paralogs (>40% amino acid identity) 
in ORS278 (Supplementary Table S4). Symbiotic signals in classic rhizobia (Nod factors) and 
endomycorhizal fungi (Myc factors) are /Y-acetyl glucosamine derivatives [36-38]. Enzymes involved 
in Glucosamine modifications were detected in the 446 genes set, as BRAD06639, encoding an 
UDP-Af-acetylglucosamine 1-carboxyvinyltransferase, and BRAD04235/BRAD07123, two homologs 
encoding a putative O-iV-acetylglucosamine transferase related protein with TPR domains. These 
genes are also good candidates for future directed mutagenesis studies. 

Table 3. List of specific annotated genes in nod-independent bradyrhizobia. Genes in bold 
are located in HAIs detected in ORS278. 



CDS labels 



Gene 



Product 



Regulation 

Two component regulatory systems 

BRAD03895 

BRAD04688 

BRAD04689 

BRADO5320 

BRAD06874 

BRADO7009 

BRADO7010 

BRAD05682 

BRADO1407 

BRADO5100 

Transcriptional Regulators 

BRAD03272 

BRAD03678 

BRAD05987 

Cyclases 

BRADO0188 

BRADO1402 

BRAD02821 

BRADO6510 

Circadian clock operon 

BRAD01478 

BRAD01479 

BRADO1480 

BRAD01481 



kaiC 
kaiB 



putative two-component system, response regulator receiver 

putative two-component system sensor protein with Hpt domain 
putative sensor histidine kinase with a response regulator receiver 
domain 

putative sensor histidine kinase (PAS & response regulator receiver) 

putative two component system, regulator receiver (CheY-like protein) 

putative response regulator receiver (CheY-like protein) 

putative sensor histidine kinase (receiver & phosphotransferase domains) 

putative two component sensor histidine kinase 

putative Two-component system histidine kinase 

Putative Two-component sensor histidine kinase 

putative transcriptional regulatory protein, TetR/AcrR family 
putative transcriptional regulatory protein, TetR family 
putative transcriptional regulator, TetR family 

putative diguanylate cyclase (GGDEF) domain 

putative diguanylate cyclase (GGDEF) 

putative Adenylate cyclase with a CHAD domain 

putative diguanylate cyclase with GGDEF and EAL domains 

circadian clock protein kinase kaiC 
circadian clock protein 

putative signal transduction histidine kinase with PAS/PAC domains 
putative response regulator receiver (CheY-like protein) 



Catabolism 

Plant wall degradation 

BRAD01851 

BRAD02335-2343 

BRAD02379 

BRADO2380 

BRAD02382 



vanB vanillate O-demethylase oxidoreductase (Vanillate degradation) 
protochatechuate transport and degradation (ligJABC) 
protocatechuate 4,5-dioxygenase (4,5-PCD), alpha chain 
2,3-dihydroxyphenylpropionate 1 ,2-dioxygenase 

putative transcriptional regulator, PadR-like family 
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Table 3. Cont. 



CDS labels 



Gene 



Product 



BRAD04665 

Various Catabolism/Detoxification 

BRADO0718 

BRAD01842 

BRAD01843 

BRAD01844 

BRAD01848 

BRAD01849 

BRADO4030 

BRADO4031 

NUDIX hydrolase 

BRADO3028 

BRAD03892 

BRAD04664 



hpcB homoprotocatechuate 2,3-dioxygenase 

putative intradiol ring-cleavage dioxygenase 
putative amine oxidase 
putative ATPase, AAA family 
putative Glutathione S-transferase 

putative thiosulfate sulfurtransferase with Rhodanese-like domain 
conserved HP; putative oxidoreductase 

putative TauD/TfdA family dioxygenase (Taurine catabolism) 
putative dioxygenase; putative taurine dioxygenase (Taurine catabolism) 

putative MutT/nudix family protein 

putative NUDIX-like hydrolase (modular protein) 

putative NUDIX hydrolase 



Biosynthesis and modification enzymes 

Af-AcetylGlucosamine modifying enzymes 

BRAD03973 

BRAD04235 
BRAD06639 
BRAD07123 

Polysaccharide biosynthesis & glycans 

BRAD04794 

BRAD04795 

BRADO4796-4800 

BRADO4801 

BRADO4802 

BRADO4803 

BRADO4804 

BRADO4807 

BRAD04814 

BRAD05154 

BRAD05156 

BRAD05164 

BRAD05165 

BRAD05167 

BRAD05168 

BRAD05173 

BRAD05197 

BRADO5204 

BRAD06336 

Others 

BRADO0151 
BRADO0860 
BRAD01346 



putative UDP-A^-Acetylgucosamine-Peptide-A^-Acetylglucosaminyl 
transferase subunit-related 

putative (9-GlcNAc transferase related protein (TRP repeats) 
UDP-7V-acetylglucosamine 1 -carboxy vinyltransferase 
putative O-GlcNAc transferase related protein (TRP repeats) 

putative D-3-phosphoglycerate dehydrogenase 

putative carbohydrate kinase (xylulose/erythritol kinase, lyx/eryA-like) 

putative sugar ABC transporter (permease + ATP -binding) 

putative transcription regulator (EryD-like) 

putative carbohydrate kinase 

putative hydrolase (HAD superfamily) 

putative aldolase/epimerase (AraD-like) 

putative 3-oxoacyl-acyl-carrier-protein reductase 

putative TV-acetyl-mannosamine transferase (teich 

putative Capsular polysaccharide biosynthesis protein 

putative Asparagine synthetase 

putative bifunctional enzyme (sugar kinase/cytidylyltransferase) 

putative sugar-phosphate nucleotidyl transferase 

putative Phosphoheptose isomerase 

putative dTDP-glucose 4,6-dehydratase 

putative aminoglycoside phosphotransferase family protein 

putative Glycosyl transferase, group 1 

phosphoheptose isomerase (Sedoheptulose 7-phosphate isomerase) 

putative NiFe-hydrogenase/urease accessory HupE/UreJ family protein 

putative Type III polyketide synthase; putative chalcone synthase 

putative asparagine synthetase (glutamine-hydrolyzing) 

putative Phenazine biosynthesis protein 
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Table 3. Cont. 



CDS labels 


Gene 


Product 


BRADO1560 




putative aldolase 


BRAD04898 




putative cobalt-precorrin-6A synthase, cbiD (Vitamin B 12 biosynthesis) 


BRADO4900 




putative CobE protein (Vitamin B12 biosynthesis) 


BRADO4906 




putative precorrin-3B synthase (cobG) (Vitamin B 12 biosynthesis) 


BRADO4910 




putative HupE/UreJ protein 


BRAD04916 




putative cobalamin synthase (CobS family) (Vitamin B 12 biosynthesis) 


BRAD04917 




alpha-ribazole phosphatase (anaerobic pathway cobalamin 
biosynthesis) 


BKAD07U16 




indole-3 -pyruvate decarboxylase (IAA biosynthesis) 


rNiirogen iixanon 






rjKAJJUjjoy 


cysE 


serine acetyltransferase 


rJKAJJUjjyU 


mtv 


homocitrate synthase 


rJKAJJUjjyo 




putative transcriptional regulatory protein (protein cheY homolog) 


dKAUUj4Uj 




putative carboxymethylenebutenolidase(DLH) 


BRADO5410 




putative Ferredoxin, 2Fe-2S 


BRAD05411 




putative Aminotransferase, DegT/DnrJ/EryCl/StrS family 


BRAD05414 




conserved HP (NifZ domain) 


BRAD05416 




LRV protein FeS4 cluster in iV-terminus 


BRAD05417 




conserved HP; TPR repeat 


BRAD05418 




putative iron-sulfur cluster assembly protein (SufA-like) 


BRAD05424 




putative nifU protein (C-terminal fragment) 


Carbon monoxide fixation 






BRADO6025 




putative Sensor histidine kinase (HWE family) with a GAF domain 


BRADO6026 


coxF 


carbon monoxide dehydrogenase, coxF accessory protein 


BRADO6027 


coxE 


carbon monoxide dehydrogenase, coxE accessory protein 


BRADO6029 


coxL 


carbon monoxide dehydrogenase large chain 


BRADO6031 


coxM 


carbon monoxide dehydrogenase medium chain (CO-DH M) 


BRADO6032 


coxC 


carbon monoxide dehydrogenase, coxC signalling protein 


Photosynthesis & Carotenoid 






BRADO0659-0662 




carotenoid synthesis 


BRADO2007 




putative heme oxygenase 


BRADO2008 




putative bacteriophytochrome 


Quorum sensing 






BRADO0941 




autoinducer (acylhomoserine lactone) synthase 


Chemotaxis 






BRAD02926 




putative Methyl-accepting chemotaxis protein 


BRADO0678 




putative methyl-accepting chemotaxis protein 


BRADO3031 




putative methyl-accepting chemotaxis receptor/sensory transducer 


BRAD04514 




putative bacterial chemotaxis sensory transducer 


BRADO5051 




conserved HP-cheL 


Transport 






Transporters 






BRADO0677 




putative TRAP-dicarboxylate transporter (DctP subunit) 


Secretion 






BRAD06338-6344 




general secretion pathway proteins (Gspl, GspL) 


BRAD07122 




putative HlyD-family secretion protein 
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The specific gene set of nod-dependent strains (Bj USDA110 and ORS285) was also investigated 
for functions of ecological interest. This gene set was 10 times smaller (49 genes) than the 
nod-independent one (446 genes). This low number of genes reflects the high phylogenetic distance 
between USDA110 (BJ clade) and ORS285 (BP clade), as all genes in PB have been subtracted from 
the gene pool of ORS285 in this comparative analysis. This result adds support to the hypothesis of a 
transfer of the symbiotic island from the BJ clade to BP. Interestingly, if such transfers can be 
detected, they are either rare since no other genes (apart from symbiotic ones) were detected in 
common, or they are very widespread in PB and so were present in the subtracted gene pool. Among 
the 49 common genes in nod-dependent strains, two main classes of genes with a predicted function 
can be distinguished: as expected, the no d genes (nodD, nodA, nodB, nodC, nodS, nodU, nodi, nodi, 
nodZ, noel, nolA), and more interestingly, the rhc genes (also named tts) that encode core components 
of a type three-secretion system (T3SS) [39,40]. 

The nod gene equipment therefore appears very similar between the two strains ORS285 and 
USDA1 10. The only differences observed are the additional presence in B. japonicum of: (i) nolO which 
encodes a carbamoyltransferase and (ii) the two component regulatory system nodV/nodW. In counter-part, 
ORS285 possesses an additional nodL gene that encodes an acetyl transferase. In accordance with similar 
nod genes content are the facts that both strains produce the same major Nod factor (a pentameric LCO 
with a 2-0-methylfucose at the reducing end) and are able to nodulate common host plants; in particular, 
the B. japonicum strain was recently shown to induce root and stem nodules on A. afraspera [41]. 
However, in contrast to the ORS285 strain, B. japonicum failed to induce nodule on Aeschynomene 
species which are nodulated by the nod-lacking strains such as A. indica and A. sensitiva suggesting that 
B. japonicum lacks key determinants required for the Nod-independent symbiotic process [34]. 

In B. japonicum, the tts gene cluster is found on a large symbiotic island of 681 kb [17]. As these 
genes are absent in the nod-lacking strains, we can postulate that ORS285 also acquired a symbiotic 
island harboring both the nod and the tts gene clusters by a lateral gene transfer. The T3SS is used by 
many bacterial pathogens to deliver directly a cocktail of effector proteins in eukaryotic cells that will 
subvert host defense [42]. This secretory machinery has been shown in several strains of rhizobia 
(including B. japonicum USDA 110) to affect symbiosis positively or negatively depending on the 
host [43]. It could therefore be particularly interesting to examine the impact of mutations in the tts 
cluster of the ORS285 strain to infer its role in the Nod-dependent or Nod-independent symbiotic 
processes in relation to the host plant. 

3. Experimental Section 

3.1. Bacterial Strains Maintenance and Preparation of Genomic DNA 

Strains listed in Table 1 were grown on YMA [44], and bacterial genomic DNA was isolated 
following a modified CTAB protocol as recommended by the Joint Genome Institute [45]. 

3.2. Sequencing and Assembly of 7 Br adyrhizobium Genomes 

Four genomes (ORS285, ORS375, STM3809, STM3843) were sequenced with Roche 454 GS-FLX 
technology (Genoscope, CEA, France) with a coverage of 11 to 28X, and with Illumina Solexa 
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technology (GATC company, 36 bp-reads length, 20X cover). GS-FLX reads were assembled using 
Newbler software (v. 1.1.03.24), leading to 300 to 800 contigs. These contigs were fused to obtain a 
single scaffold, as a single chromosome and no plasmid could be visualized by PFGE analyses (not shown) 
and no repC homologs could be found in the gene content of each of these genomes (repC being a 
specific gene for plasmid replication in alphaproteobacteria). Solexa reads were used to correct contig 
consensus and sequencing errors. Three other genomes were sequenced only with Illumina Solexa 
technology (STM3847, STM4509 and STM4523) by the company GATC (Germany). Short 36 bp reads 
were pre-assembled with PERL script Vcake [46], using a first round of k-mer extensions of Solexa 
reads, then a second round using progressive extension of contigs obtained from the first round. This 
strategy allowed the construction of a sequence database of 100-140,000 contigs for each genome, 
subsequently used in global alignments. A last contig assembly was also built using CLC Genomics 
Workbench 4.0.2 (mismatch cost 2, Insertion cost 3, Deletion cost 3, length fraction 0.5, Similarity 0.8; 
length kept >50 bp), producing 12 to 14,000 contigs per strain, allowing gene search and phylogeny. 
Annotated genomes based on 454 GS-FLX assemblies were submitted to the EBI [47]. Solexa reads 
were submitted to the Sequence Read Archive (SAR) of the European Nucleotide Archive (ENA) 
under accession number ERP000868. Genomes are also available on the MicroScope annotation and 
comparative genomics platform [48] in the Rhizoscope Project. 

3.3. Automatic and Expert Annotation of Bradyrhizobial Genomes 

Annotation of bacterial genomes was conducted following 3 main steps: syntactic annotation to find 
genomic objects, followed by functional annotation to assign roles on the basis of sequence similarity 
with sequences from different databanks, and a relational annotation which establishes links between 
different genomic objects. All these steps were performed in the MicroScope platform pipeline and 
results are available via the MaGe (Magnifying Genomes) graphical interface [48,49]. For syntactic 
annotation, coding sequences were predicted using AMIGene (Annotation of Microbial Genomes) 
software [50]. Functional annotation used tools listed in [51] to give functions to the set of predicted 
genes. First, for strains ORS285, ORS375, STM3809 and STM3843, annotations were automatically 
transferred from strains ORS278, BTAil and E. coli K12 (in that order of priority) for genes having good 
similarities {i.e., 80% identity over at least 70% of the length of the smallest protein). These automatically 
generated results were stored in a relational database, and were further used as a starting point for 
manual validation with the MaGe web-interface, allowing data exploration, graphical visualizations 
and genomic comparisons. A functional classification was also produced for strain(-set)-specific genes 
using the KEGG database (via the KEGG Automatic Annotation Server [52]). 

3.4. Whole Genome Alignments 

Four sequenced Bradyrhizobium genomes were selected as reference. The remaining genomes were 
aligned on each reference genome using the MUMmer 3.0 whole genome global alignment system [53] 
to find conserved and organism-specific regions. In order to find potential alignment anchors, defined 
as maximal unique matches (MUMs) between two genome sequences, the MUMmer software uses 
suffix-trees. A suffix tree is a compact representation that stores all possible suffixes of an input 
sequence. MUMer then performed a protein alignment, starting by the translation of nucleic sequences 
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in all 6 reading frames. Aligned regions between the reference and another genome that did not have a 
minimal identity of 70% and a match length of at least 20 amino-acids were filtered out. The CGview 
application [54] was used for graphical representation of whole genome alignments. 

3.5. GC% Deviation 

GC% deviation was computed using sliding 1,000 bp windows and represents the ratio between 
GC percent in the window and the mean GC percent of the genome. High deviations can indicate a 
potential foreign origin of the sequence within the window. 

3.6. Core-Genome and Pangenome 

Genes from two or more genomes were considered as orthologous if their encoded proteins satisfied 
bidirectional best-hit (BBH) criteria and alignment thresholds (40% amino acid identity on 80% of the 
length of the smallest protein). BBH are commonly used in comparative genomics to infer orthologous 
genes [55-57], and are integrated in the DOE IMG and MAGE platforms [49,58]. Two genes from two 
different genomes are bidirectional best hits when each is the best hit of the other. A bidirectional best-hit 
is considered as a good evidence that the genes may be orthologs arising from a common ancestor. 

For Venn diagrams displayed in Figure 2 and Figure 6, comparison of gene sets between genomes 
were performed using the Phyloprofile Exploration tool of the MicroScope platform [48], using the 
following orthology criterion: BBH with 40% amino acid identity on 80% of the length of the smallest 
protein. Gene exploration in the 3 highly fragmented genomes (derived from solexa reads assemblies) 
was performed using tblastn searches with the Stand alone Blast software [59]. 

3. 7. Phylogenomics 

The core genome -based phylogeny was produced by aligning partitions for the 3,663 bradyrhizobia 
orthologs using ASAP software [60] (based on the Muscle alignment software [61]), and using 
PAUP4M0 to infer phylogenies on a combined partitions of all 3,663 genes (criterion set to distance, 
using Jukes-Cantor distance correction, and 100 bootstrap replicates). MLSA-based phylogeny shown 
in Figure 2B was obtained by Maximum Likelihood using a GTR + I + G model and 100 bootstrap 
replicates (using MEGA5 [62]) 

3.8. Search for Genomic Islands (GI) 

The «RGPfinder» tool of the MicroScope annotation platform [63] is a comparative genomics 
method that searches for synteny breakpoints that delimit regions of at least 5 kbp in a target genome 
that are missing in one or more other related genomes. These RGPs (for Regions of Genomic 
Plasticity) can be insertion sites of GI, or the result of deletions of particular DNA segments in one or 
more strains. Region prediction is accompanied by the characterization of several common GI features, 
such as detection of tRNAs, IS (Insertion Sequence) and repeats, as well as information about 
nucleotide composition abnormalities [% (G + C) deviation, Codon Adaptation Index]. GI from two 
distinct organisms were considered identical when they shared homologous genes (at least 40% identity 
on 80% of the length of the smallest protein) in synteny on at least half of the GI (considering its size 
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and number of genes). Correspondences between genomic islands in different Bradyrhizobium 
genomes were thus established in order to construct a binary matrix wherein each line represents a 
Bradyrhizobium genome and each column a GI. This matrix was then used to build a similarity tree 
which was compared to the phylogenetic tree of the Bradyrhizobium genomes. 

3.9. Metabolic Comparisons 

In each genome, the metabolic network was predicted by the "Pathway Tools" software [64] using 
MetaCyc [65] as a reference pathway database (version 13.0). The metabolic networks for each 
Bradyrhizobium sp. genome are directly available in the MaGe graphical interface at [66]. Using these 
data, a two-dimensional matrix was built for use in a Principal Component Analysis (with R package 
"FactoMineR"), which will highlight possible metabolic similarities and specificities between the 
genomes. For graphical representations, the variable (pathways) plot and the individual (genomes) plot 
were combined, restricting plotted variables to those with a quality of representation greater than 0.75, 
in order to conserve interpretability. The clustering method used a Euclidean distance and ward's 
criterion; the number of classes was chosen after manual examination of the cluster tree, and 
led to 7 classes for the first factorial plan (see Supplementary Table S2). 

4. Conclusions 

Bacterial comparative genomics revealed a powerful tool to analyze Aeschynomene symbionts 
genome characteristics and their ecological niche adaptation. Photosynthetic Bradyrhizobium have 
large genomes (8-10 Mb), with a rather small core genome (3,663 genes) compared to their large 
pangenome (18,040 genes). The dispensable genome and strain-specific genes (mainly located on GI) 
exhibit numerous functions of adaptation towards the special ecological niche of PB (soil/aquatic, 
light-environment, stem/root nodule). 

Our comparative genomics approach led us to the detection of specific gene pools in bradyrhizobial 
strains linked to two phenotypes of interest, i.e., the photosynthetic ability and the nod-independent/ 
nod-dependent symbiosis. Concerning bacterial photosynthesis, genome comparison reveals a clear 
ancestral origin for the photosynthetic activity. The LH2 type antennae complexes appear to be more 
widely distributed among PB that expected. The presence of a ebb gene cluster in synteny for all 
sequenced genomes suggests that preserving a functional CBB cycle confers a clear advantage. 
Finally, several new bacteriophytochromes were identified and further studies are required to explain 
their functional role. For the symbiotic process, we identified several target genes (as transcriptional 
regulators, polysaccharide biosynthesis and iV-acetylglucosamine modification enzymes or plant-wall 
degrading enzymes) that will be further studied at the genetic level in order to prove their contribution 
to the nod-independent process. Finally, transcriptomic approaches on PB during their symbiosis with 
Aeschynomeme are currently being developed to identify more precisely the genetic determinants 
involved in this atypical mutualistic interaction. 
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